SQL Server – Indexing strategy for a dynamic predicate

Tags: index, index-tuning, sql-server, sql-server-2012

Assume SQL Server 2012 Standard edition.

My database has a table with 500 million rows. The table has about a dozen columns, none of which are very wide (some varchar(100)'s and some ints).

The clustered index (also primary key) is an identity column.

The application using this table has a screen where the user can search on most of the columns. One search field on the screen, which is required, offers the choice of a starts-with or a contains search, resulting in either

WHERE ABC LIKE 'something%' -- starts with 

or

WHERE ABC LIKE '%something%' -- contains

The actual queries are parameterized, unlike my examples here.

The other search fields do a starts-with search just like the first example above, but they are not required. So, any combination of these fields can be searched on, resulting in a dynamic WHERE clause (sketched below).
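For concreteness, the pattern looks roughly like this (table and column names here are placeholders, not my real schema):

-- Hypothetical search parameters; NULL means "not searched on"
DECLARE @abcParam   varchar(100) = 'something%',  -- required field
        @cityParam  varchar(100) = NULL,
        @stateParam varchar(100) = 'TX%';

DECLARE @sql nvarchar(max) = N'SELECT Id, ABC, City, State
FROM dbo.SearchTable
WHERE ABC LIKE @abc';

-- Append a predicate only for the optional fields the user filled in
IF @cityParam IS NOT NULL SET @sql += N' AND City LIKE @city';
IF @stateParam IS NOT NULL SET @sql += N' AND State LIKE @state';

EXEC sys.sp_executesql @sql,
    N'@abc varchar(100), @city varchar(100), @state varchar(100)',
    @abc = @abcParam, @city = @cityParam, @state = @stateParam;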

Given this information, what indexes should be created for optimal performance?

Bearing in mind that I'm new to query performance tuning, my naive strategy for this scenario is to create a non-clustered index for each column and to use full-text search for the column that has the contains search option. I'd love to hear why that is or isn't a good idea, and what a better approach would be.
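In DDL terms, that naive strategy would be something like the following sketch (again with placeholder object names, and assuming a full-text catalog can be created):

-- One single-column index per searchable field (names are placeholders)
CREATE NONCLUSTERED INDEX IX_SearchTable_ABC   ON dbo.SearchTable (ABC);
CREATE NONCLUSTERED INDEX IX_SearchTable_City  ON dbo.SearchTable (City);
CREATE NONCLUSTERED INDEX IX_SearchTable_State ON dbo.SearchTable (State);

-- Full-text index for the "contains" column; needs a catalog and a unique
-- key index (here the primary key on the identity column)
CREATE FULLTEXT CATALOG ftSearchCatalog AS DEFAULT;
CREATE FULLTEXT INDEX ON dbo.SearchTable (ABC)
    KEY INDEX PK_SearchTable ON ftSearchCatalog;

-- Note: CONTAINS matches on words and prefixes, not arbitrary substrings,
-- so it is not an exact replacement for LIKE '%something%'
SELECT Id FROM dbo.SearchTable WHERE CONTAINS(ABC, '"something*"');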

Update

I'm aware that full-text search is the standard way to optimize the "contains" case.

I'm much more interested in the other aspect of the problem: how to optimize for the other search fields, which may or may not be present in any given query predicate. The details about the field that can benefit from a full-text index are included only to paint a more complete picture of my situation.

Best Answer

If I were you, I would run a trace specific to hits on that table. It shouldn't be overly intensive since you are restricting it to queries against that one table, from your application. Capture just the minimum events needed by the DTA (Database Engine Tuning Advisor). Run it for a day here, a day there, making sure you get some end-of-week days and some end-of-month days. Then run the whole lot through the DTA.
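As a quick complement to the trace (not a replacement for it), you can also peek at the plan cache to see which statements are already hitting the table; the table name here is a placeholder:

SELECT TOP (50) qs.execution_count, st.text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
WHERE st.text LIKE '%SearchTable%'  -- filter to statements touching the table
ORDER BY qs.execution_count DESC;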

Here is why: I'm willing to bet that certain combinations of columns come up far more often than others. You can create composite indexes based on that information. You might also find that you can create some correlated statistics, that is, statistics covering more than one column. For example, creating a statistic on City and State together may improve queries that filter on both of those columns (see the sketch below).
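Using the City/State example, and placeholder object names, that might look like:

-- A multi-column ("correlated") statistic for a pair of columns that are
-- frequently searched together
CREATE STATISTICS st_SearchTable_City_State ON dbo.SearchTable (City, State);

-- A composite index for the same common combination; put the more
-- selective / more frequently filtered column first
CREATE NONCLUSTERED INDEX IX_SearchTable_City_State
    ON dbo.SearchTable (City, State);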

However, make sure you don't create too many indexes. On a table that large I'm guessing you do a fair number of writes, and every additional index will slow them down. Of course, you may do most of your writes during a batch process.

Also, make sure you put an automatic process in place to update your statistics periodically. With that many rows, the statistics aren't going to update on their own very often: the auto-update threshold is 500 rows plus 20% of the table, and at 500 million rows that's a LOT (roughly 100 million modifications).
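For example, a nightly SQL Server Agent job could run something along these lines (sampled, because a FULLSCAN over 500 million rows is likely too slow for a maintenance window; the table name is a placeholder):

UPDATE STATISTICS dbo.SearchTable WITH SAMPLE 10 PERCENT;

On 2008 R2 SP1 and later, trace flag 2371 is another option: it makes the auto-update threshold dynamic, so very large tables update their statistics more often.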