Sql-server – Best way to choose which columns to index to improve performance

indexperformancesql server

I use a MSSQL Database for my web application and now the user audit(USER_AUDIT) records in the application have crossed beyond 5 million records/rows. Every day the performance of the application is degrading and the queries for the audit table are pretty straight forward from the UI.

The default sorting on the data is on the TimeStamp i.e. the time when the record was created so I was wondering if adding an INDEX on the TimeStamp column would improve the performance of the application or because the the ID column(Incremental IDs) already has the INDEX on it, its sufficient and would not improve the performance any further.

Also sometimes the records need to be filtered using the username which is also one of the columns in the same table but because the user names are not unique I was wondering if it would help at all to add INDEX to that column as well.

The clustered index is on the ID column on the table – Would it be helpful if I change the Clustered INDEX to be on a different column

Or

Based on the above information would it be of any help to add Non-clustered Indexes to either TimeStamp or Username or both columns

I come from a programming background with very limited knowledge of the Database management so please pardon any missing information and please feel free to let me know if I need to add more information to the Question.

I would really appreciate it even if you could point me in the right direction.

Best Answer

Determining indexes for tables is not easy. I can offer some general guidelines that will help, but the specifics are going to be up to you to determine through testing on your systems with your queries.

The first, and most important, index you set is your clustered index. You only get one, so you want to make it right. Further, the key on your clustered index is also going to be the key for looking up all your non-clustered indexes. So, generally, you want to try satisfy three general criteria for your clustered index. First, the cluster should go on the column, or columns, that define the most commonly used path to the data. You don't say how the data is most commonly filtered above, so I can't make a recommendation based on that. Second, because the clustered index key is used for lookups in other indexes, as much as you can, you want it to be as small a data type as possible. You'll frequently see an ID column like you have because it satisfies this. However, don't get hung up on this. If a 'Name' column is the most frequently used path to the data, even though it's a wider key than an int, it can still be the clustered index. Further, the clustered index has to be unique. That's not to say the column must be unique. SQL Server can make the cluster unique internally if it has to. Yes, this adds overhead, so unique data is preferred, but again, driven by reality, not arbitrary rules.

Nonclustered indexes. Use as few as you can. However, create them as you need them. From the sounds of things, sorting on timestamp frequently, you might want an index there. However, this is where things get fun, that doesn't mean it will get used to sort the data. It depends on the other filtering criteria and how the data is being retrieved, which columns, the exact WHERE clause, etc. Same thing goes for creating a nonclustered index on Name. It could be a good choice. However, what other columns are being retrieved? As was already mentioned, if you add a nonclustered index, but it only satisfies the filter criteria for the query, you'll still have to go to the clustered index to look up the rest of the columns. This is fixed by either adding additional columns to the nonclustered key (although, only do that if they're also used for filtering), or by using the INCLUDE key word.

All of this has to be tested by looking at the execution plans for your query. That will show you which indexes are being used and how they are used. So plan on pulling that information out.

Yeah, this is an enormous topic and I've only scratched the surface here. I have a 900 page book on query tuning and a 600 page book on execution plans. There's that much to talk about.