Indexing from start or when performance problem arises

database-agnosticindexperformance

My question is regarding use of indexes.

  1. Should I start indexing right from the start or when performance problem arises?

  2. We can also create temporary index while executing a query. What are the pros and cons of such techniques?

Best Answer

Should I start indexing right from the start or when performance problem arises?

Indexing strategy tends to evolve as usage patterns emerge. That said, there are also strategies and design guidelines that can be applied up front.

  • Choose a good clustering key. You can usually determine the appropriate clustered index at design time, based on the expected pattern of inserts to a table. If a compelling case emerges for a change in the future, so be it.

  • Create your primary and other unique constraints. These will be enforced by unique indexes.

  • Create your foreign keys and associated non-clustered indexes. Foreign keys are your most frequently referenced join columns, so index them from the start.

  • Create indexes for any obviously highly selective queries. For query patterns you already know will be highly selective and likely to use lookups rather than scans.

Beyond the above, take a gradual and holistic approach to implementing new indexes. By holistic, I mean assess the potential benefit and impact to all queries and existing indexes when evaluating an addition.

A not uncommon problem in SQL Server circles is overindexing, as a result of guidance from the missing index DMVs and SSMS hints. Neither of these tools evaluate existing indexes and will merrily suggest you create a new 6 column index rather than add a single column to an existing 5 column index.

-- If you have this
CREATE NONCLUSTERED INDEX [IX_MyTable_MyIndex] ON [dbo].[MyTable] 
(
    [col1] ASC
    , [col2] ASC
    , [col3] ASC
    , [col4] ASC
    , [col5] ASC
)

-- But your query would benefit from the addition of a column
CREATE NONCLUSTERED INDEX [IX_MyTable_MyIndex] ON [dbo].[MyTable] 
(
    [col1] ASC
    , [col2] ASC
    , [col3] ASC
    , [col4] ASC
    , [col5] ASC
    , [col6] ASC
)

-- SSMS will suggest you create this instead
CREATE NONCLUSTERED INDEX [IX_MyTable_AnotherIndexWithTheSameColumnsAsTheExistingIndexPlusCol6] ON [dbo].[MyTable] 
(
    [col1] ASC
    , [col2] ASC
    , [col3] ASC
    , [col4] ASC
    , [col5] ASC
    , [col6] ASC
)

Kimberly Tripp has some excellent material on indexing strategy that while SQL focused is applicable to other platforms. For the SQL Server folk, there are some handy tools for identifying duplicates like the example above.

We can also create temporary index while executing a query. What are the pros and cons of such techniques?

This usually only applies for rarely run queries, typically ETL. You need to assess:

  1. Does the time taken to create the index reduce the execution time of the query.
  2. Does the maintenance overhead of leaving the index in place outweigh the time taken to create/drop when it's needed.