Sql-server – Unexpected Indexing performance (Everything’s equal)

computed-column index performance query-performance sql-server sql-server-2008-r2

I wanted to see the effects of having indexes on calculated columns, so I created a table like so:

CREATE TABLE [Domain\UserName].[CompColIndexing](
    [a] [int] NOT NULL,
    [nonIndexedNonPersisted]  AS ([a]+(1)),
    [nonIndexedPersisted]  AS ([a]+(1)) PERSISTED,
    [IndexedNonPersisted]  AS ([a]+(1)),
    [IndexedPersisted]  AS ([a]+(1)) PERSISTED
) ON [DATA]

I've added 800,000 rows to this, with the value for a cycling through 0 to 9.
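
The load script itself isn't shown. A minimal sketch of one way to produce such data, assuming the table above already exists and that cross joining sys.all_objects with itself yields at least 800,000 rows (an assumption about the catalog size, though it normally holds):

```sql
-- Sketch only: the original load script is unknown.
-- Generates 800,000 rows with [a] cycling through 0, 1, ..., 9.
INSERT INTO [Domain\UserName].[CompColIndexing] ([a])
SELECT TOP (800000)
       (ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1) % 10
FROM sys.all_objects AS o1
CROSS JOIN sys.all_objects AS o2;
```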

The following indexes were added:

CREATE NONCLUSTERED INDEX [IX_DJB_CompNonPersisted] ON [Domain\UserName].[CompColIndexing] 
(
    [IndexedNonPersisted] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [DATA]
GO

CREATE NONCLUSTERED INDEX [IX_DJB_CompPersisted] ON [Domain\UserName].[CompColIndexing] 
(
    [IndexedPersisted] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [DATA]
GO

Then I ran some queries with ORDER BY clauses to see what performance differences I would get, planning afterwards to see how changing the values of a would affect things.

SELECT *
FROM [Domain\UserName].[CompColIndexing]
ORDER BY a

SELECT *
FROM [Domain\UserName].[CompColIndexing]
ORDER BY nonIndexedNonPersisted

SELECT *
FROM [Domain\UserName].[CompColIndexing]
ORDER BY IndexedNonPersisted

SELECT *
FROM [Domain\UserName].[CompColIndexing]
ORDER BY nonIndexedPersisted

SELECT *
FROM [Domain\UserName].[CompColIndexing]
ORDER BY IndexedPersisted

Unexpectedly, though, I get exactly the same result for each of the queries.

I was at least expecting the SORT operation on the first query to be slower as that one is unindexed.

What's happening here?

The cardinality is low on purpose; in reality, I need to sort on three different values.

I'm using Microsoft SQL Server 2008 R2 (SP2) - 10.50.4042.0 (X64)

Actual execution plans available at: https://www.brentozar.com/pastetheplan/?id=S10MTuxGg

Best Answer

The table definition that you have is leading to some really odd optimizer behavior. I suspect that you're running into the issue documented in this SE post. To avoid that issue I'm going to create the table with just the [a] and the [IndexedPersisted] columns.
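
A sketch of that reduced setup, for anyone who wants to follow along. It assumes the dbo schema used in the queries below, the default filegroup instead of [DATA], and a data load with the same shape as in the question (800,000 rows, [a] cycling through 0 to 9); the exact script may differ:

```sql
-- Reduced repro: only the base column and the persisted, indexed computed column.
CREATE TABLE [dbo].[CompColIndexing](
    [a] [int] NOT NULL,
    [IndexedPersisted] AS ([a]+(1)) PERSISTED
);

-- Assumed load: 800,000 rows, [a] cycling through 0 to 9.
INSERT INTO [dbo].[CompColIndexing] ([a])
SELECT TOP (800000)
       (ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1) % 10
FROM sys.all_objects AS o1
CROSS JOIN sys.all_objects AS o2;

-- Same index name as in the question, default options.
CREATE NONCLUSTERED INDEX [IX_DJB_CompPersisted]
ON [dbo].[CompColIndexing] ([IndexedPersisted] ASC);
```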

Query hints can be useful to figure out why the optimizer didn't pick the plan that you expected. Here you expected the index to be used but SQL Server did not use it. Let's view both query plans side by side:

SELECT *
FROM [dbo].[CompColIndexing]
ORDER BY [IndexedPersisted]
OPTION (MAXDOP 1);

SELECT *
FROM [dbo].[CompColIndexing] WITH (INDEX([IX_DJB_CompPersisted]))
ORDER BY [IndexedPersisted]
OPTION (MAXDOP 1);

The query optimizer thinks that the sort after the table scan is cheaper than 800000 RID lookups. Maybe it's wrong, so let's run the queries and compare performance metrics for them.

query                   cpu_time    total_elapsed_time  logical_reads   reads
no hint (scan + sort)   1204        1395                3087            1485
forced index            1516        1570                851798          0

I got those numbers by looking at sys.dm_exec_sessions after running the queries in separate sessions with "discard results after execution" checked so I wouldn't have to wait on the rows to be returned to the client.
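
A query along these lines returns those four columns. The session id below is a placeholder: it would be the id of the session that ran the test query (for example, the value returned by SELECT @@SPID in that session). The counters are cumulative per session, which is why each test runs in its own session.

```sql
-- Hypothetical session id; replace with the session that ran the test query.
DECLARE @session_id smallint = 53;

SELECT cpu_time,            -- milliseconds
       total_elapsed_time,  -- milliseconds
       logical_reads,
       reads
FROM sys.dm_exec_sessions
WHERE session_id = @session_id;
```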

Those numbers seem reasonable to me. Just because an index can be used does not mean that it should be used, especially if SQL Server will need to read the entire table by using the index. That's the worst use case for an index that I can think of. Indexes can be very useful when selecting a small percentage of rows from the table or when they are covering indexes.

The index is a covering one if I only select the [IndexedPersisted] column. In that case, SQL Server thinks that using the index is cheaper than doing a table scan. Code to compare the two methods:

-- force table scan
SELECT [IndexedPersisted]
FROM [dbo].[CompColIndexing] WITH (INDEX(0))
ORDER BY [IndexedPersisted]
OPTION (MAXDOP 1);

SELECT [IndexedPersisted]
FROM [dbo].[CompColIndexing] WITH (INDEX([IX_DJB_CompPersisted]))
ORDER BY [IndexedPersisted]
OPTION (MAXDOP 1);

Here are the performance numbers:

query           cpu_time    total_elapsed_time  logical_reads   reads
forced scan     1171        1212                2727            1088
forced index    343         406                 1798            0

Now that the index is covering, it is a better access path than a table scan.
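
The same idea could in principle be applied to the SELECT * query. In the two-column version of the table, carrying [a] as an included column would make an index cover every column. This is only a sketch: the index name is made up for illustration, and I haven't measured whether the optimizer prefers it over the scan.

```sql
-- Hypothetical index: key on the sort column, [a] stored at the leaf level.
CREATE NONCLUSTERED INDEX [IX_CompPersisted_Covering]
ON [dbo].[CompColIndexing] ([IndexedPersisted] ASC)
INCLUDE ([a]);

-- Both columns are now in the index, so no RID lookups are needed
-- if the optimizer chooses it for this query.
SELECT *
FROM [dbo].[CompColIndexing]
ORDER BY [IndexedPersisted]
OPTION (MAXDOP 1);
```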

Related Solutions

Postgresql – Indexing to improve performance of range queries

If you have an index on created, then the planner will need to choose between using that index and the PK (or a full table scan) - it will not benefit from both at the same time.

--EDIT

As pointed out by @jug in the comments below, this is not accurate at least since 8.1: the planner may choose to build two in-memory bitmaps and combine them to get the result set. This gets more expensive as the tables get bigger, so the planner may choose not to do this depending on the size of the table and the estimated cost of using one index and then filtering.

--END EDIT

The new index will only be helpful if in some cases using it is more efficient than access via the PK. The kind of things that could make this likely include:

A large number of (...) in SELECT * FROM foo WHERE foo_id IN (...) AND created > 1234 AND created <= 6789
A small range, e.g. created > 6780 AND created <= 6790

Unless one or both of these is likely to happen, you should not create the secondary index. If they might, test each scenario with and without the index to see whether any performance benefit is worth the cost (e.g. increased storage and overhead for insert and update operations).
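
A rough sketch of such a test, assuming PostgreSQL 9.0 or later (for the EXPLAIN option syntax) and a table foo with the created column from the examples above. The index name is hypothetical:

```sql
-- Baseline: plan and timing without the secondary index.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM foo
WHERE created > 6780 AND created <= 6790;

-- Hypothetical index name; the column comes from the example predicates.
CREATE INDEX foo_created_idx ON foo (created);
ANALYZE foo;

-- Same query again, now with the index available to the planner.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM foo
WHERE created > 6780 AND created <= 6790;

-- Remove the index again if the gain does not justify its cost.
DROP INDEX foo_created_idx;
```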

Indexing from start or when performance problem arises

Should I start indexing right from the start or when performance problem arises?

Indexing strategy tends to evolve as usage patterns emerge. That said, there are also strategies and design guidelines that can be applied up front.

Choose a good clustering key. You can usually determine the appropriate clustered index at design time, based on the expected pattern of inserts to a table. If a compelling case emerges for a change in the future, so be it.
Create your primary and other unique constraints. These will be enforced by unique indexes.
Create your foreign keys and associated non-clustered indexes. Foreign keys are your most frequently referenced join columns, so index them from the start.
Create indexes for any obviously highly selective queries, that is, for query patterns you already know will be highly selective and likely to use lookups rather than scans.

Beyond the above, take a gradual and holistic approach to implementing new indexes. By holistic, I mean assess the potential benefit and impact to all queries and existing indexes when evaluating an addition.

A not uncommon problem in SQL Server circles is overindexing, as a result of guidance from the missing index DMVs and SSMS hints. Neither of these tools evaluates existing indexes, and both will merrily suggest you create a new 6-column index rather than add a single column to an existing 5-column index.

-- If you have this
CREATE NONCLUSTERED INDEX [IX_MyTable_MyIndex] ON [dbo].[MyTable] 
(
    [col1] ASC
    , [col2] ASC
    , [col3] ASC
    , [col4] ASC
    , [col5] ASC
)

-- But your query would benefit from the addition of a column
-- (DROP_EXISTING rebuilds the existing index under the same name)
CREATE NONCLUSTERED INDEX [IX_MyTable_MyIndex] ON [dbo].[MyTable] 
(
    [col1] ASC
    , [col2] ASC
    , [col3] ASC
    , [col4] ASC
    , [col5] ASC
    , [col6] ASC
)
WITH (DROP_EXISTING = ON)

-- SSMS will suggest you create this instead
CREATE NONCLUSTERED INDEX [IX_MyTable_AnotherIndexWithTheSameColumnsAsTheExistingIndexPlusCol6] ON [dbo].[MyTable] 
(
    [col1] ASC
    , [col2] ASC
    , [col3] ASC
    , [col4] ASC
    , [col5] ASC
    , [col6] ASC
)

Kimberly Tripp has some excellent material on indexing strategy that, while SQL Server focused, is applicable to other platforms. For the SQL Server folk, there are some handy tools for identifying duplicates like the example above.
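
In the absence of such a tool, a rough first pass can be made against the catalog views. This is a minimal sketch, not a substitute for those tools: it lists the key columns of every index on user tables, so that indexes on the same table whose key lists share a leading prefix sort next to each other. It ignores included columns, filters and sort direction, and the column aliases are my own.

```sql
-- List each index with its key columns as a comma-separated string.
-- FOR XML PATH is used because STRING_AGG is not available on older versions.
SELECT OBJECT_SCHEMA_NAME(i.object_id) AS table_schema,
       OBJECT_NAME(i.object_id)        AS table_name,
       i.name                          AS index_name,
       STUFF((
           SELECT ',' + c.name
           FROM sys.index_columns AS ic
           JOIN sys.columns AS c
             ON c.object_id = ic.object_id
            AND c.column_id = ic.column_id
           WHERE ic.object_id = i.object_id
             AND ic.index_id = i.index_id
             AND ic.is_included_column = 0
           ORDER BY ic.key_ordinal
           FOR XML PATH('')
       ), 1, 1, '')                    AS key_columns
FROM sys.indexes AS i
WHERE i.index_id > 0   -- skip heaps
  AND OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
ORDER BY table_schema, table_name, key_columns;
```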

We can also create a temporary index while executing a query. What are the pros and cons of such techniques?

This usually only applies for rarely run queries, typically ETL. You need to assess:

Does creating the index save more query execution time than the creation itself costs?
Does the maintenance overhead of leaving the index in place outweigh the time taken to create and drop it each time it's needed?
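
The create/use/drop pattern in question looks roughly like this. All table, column and index names are hypothetical, purely for illustration:

```sql
-- Build the index just before the rarely run job needs it.
CREATE NONCLUSTERED INDEX [IX_StagingOrders_LoadDate]
ON [dbo].[StagingOrders] ([LoadDate]);

-- The rarely run query that benefits from the index.
SELECT [OrderId], [LoadDate]
FROM [dbo].[StagingOrders]
WHERE [LoadDate] >= '20240101';

-- Drop it afterwards so regular inserts and updates do not pay for it.
DROP INDEX [IX_StagingOrders_LoadDate] ON [dbo].[StagingOrders];
```

Timing the whole sequence against the same query run without the index answers the first question; comparing that against the day-to-day write overhead of a permanent index answers the second.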
